
Add optional "Show Cost (USD)" toggle to the leaderboard #42

Open
lakshvantb wants to merge 8 commits into main from feature/cost-display

@lakshvantb (Contributor)

What

Adds a Show Cost (USD) toggle to the leaderboard. When on, each score cell gains a muted second line with cost, mirroring how scores are shown at every level (model total / category / subtask). Off by default — the table is byte-for-byte unchanged unless toggled.

Behavior: cost shows under each score; uncovered models show a single n/a.

Show cost: ☑
 Model            Global Avg   Reasoning   Coding   Agentic Coding
 Claude Opus 4.8   78.8         80.7        71.7     75.0
                   $45.21       $1.48       $1.16    $38.00     ← covered: per-cell USD totals
 Some-OSS-70b      54.2         55.1        48.0     49.9
                   n/a                                          ← not in cost file: single n/a

How it works

  • Data: a parallel, feature-detected public/cost_<date>.csv with the same header shape as table_<date>.csv (first col model, then the task columns) but holding USD totals per task.
  • Roll-up = SUM (totals, not means): subtask → category → model. Totals are additive, so the per-task costs sum to the category total and categories sum to the model total (the global cell).
  • Coverage is data-driven: a model shows cost iff it has a row in the cost file. Everyone else (including non-best variants) shows a single muted n/a — in the Global Average cell, or the first metric cell when only one category is selected. No allowlist or "top-N" logic in the UI.
  • Composes with every existing control (category average/subcategories, Show Variants, row filters, sort, organization/API-name) with no new branches — cost renders inside the cells those controls already build.
  • Zero impact until data lands: on dates with no cost_<date>.csv the toggle is hidden and the table renders exactly as today. The fetch is feature-detected and rejects the 200 response with index.html that an SPA host serves in place of a missing file.
  • A cost=true URL param, consistent with the other display toggles; Clear Filters resets it.
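The SUM roll-up described above can be sketched in plain JavaScript. This is a minimal sketch, not the PR's code: the row shape (task column to USD number), the `categories` map, and `rollUpCost()` are hypothetical, and the sketch assumes missing task values are skipped rather than counted as zero (the PR does not say which).

```javascript
// Hypothetical shapes: `row` maps a task column to a USD total (absent if the
// task has no cost); `categories` maps a category name to its task columns.

// Sum twin of an average: adds the finite values present, skips the rest,
// and returns null when no column had a value.
function sumColumns(row, columns) {
  let total = 0;
  let seen = 0;
  for (const col of columns) {
    const value = row[col];
    if (typeof value === "number" && Number.isFinite(value)) {
      total += value;
      seen += 1;
    }
  }
  return seen > 0 ? total : null;
}

// subtask -> category -> model: each category total sums its task totals, and
// the model (global) total sums the non-empty category totals.
function rollUpCost(row, categories) {
  const byCategory = {};
  let model = null;
  for (const [name, columns] of Object.entries(categories)) {
    const categoryTotal = sumColumns(row, columns);
    byCategory[name] = categoryTotal;
    if (categoryTotal !== null) model = (model === null ? 0 : model) + categoryTotal;
  }
  return { byCategory, model };
}

// Example with made-up numbers:
const categories = { Reasoning: ["a", "b"], "Agentic Coding": ["python", "javascript"] };
const result = rollUpCost({ a: 1.5, b: 2.5, python: 14, javascript: 24 }, categories);
console.log(result.byCategory.Reasoning, result.byCategory["Agentic Coding"], result.model); // 4 38 42
```

Because totals are additive, the category cells reconcile with the global cell by construction, which is the property the Testing section checks against real data.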

Files

  • src/Table/Averaging.js: sumColumns() (sum twin of calculateAverage)
  • src/Table/CSVTable.jsx: state + URL param, cost fetch, checkbox (only when a cost file is loaded), per-cell render, reset
  • src/App.css: .cost styling (muted, tabular-nums; hidden < 600px)

Activation

Drop a cost_<date>.csv into public/. It uses the same model keys as table_<date>.csv and holds USD totals per task; key each row by the displayed model name so the variant collapse matches it. Only the models in that file show cost; the rest show n/a. The initial rollout will populate the top models from a rerun, so the file is sparse by construction.
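One way to load such a file is sketched below, assuming a simple CSV with no quoted or embedded commas. `parseCostCsv()` and `costFor()` are hypothetical names; the PR does not show its parser. A model counts as covered exactly when its displayed name appears as a row key, which is why the key must match the displayed name.

```javascript
// Assumes: header "model,<task1>,<task2>,..." and one row per covered model
// holding USD totals; no quoted fields. Blank cells become missing tasks.
function parseCostCsv(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  const costs = new Map();
  if (lines.length === 0) return costs;
  const tasks = lines[0].split(",").slice(1).map((h) => h.trim());
  for (const line of lines.slice(1)) {
    const [model, ...cells] = line.split(",").map((c) => c.trim());
    const row = {};
    tasks.forEach((task, i) => {
      const value = Number.parseFloat(cells[i]);
      if (Number.isFinite(value)) row[task] = value;
    });
    costs.set(model, row);
  }
  return costs;
}

// Data-driven coverage: the row's cost data if the displayed name has a row,
// else null (the UI would then render a single "n/a").
function costFor(costs, displayedModelName) {
  return costs.has(displayedModelName) ? costs.get(displayedModelName) : null;
}

const costs = parseCostCsv("model,python,javascript\nClaude Opus 4.8,14,\n");
console.log(costFor(costs, "Claude Opus 4.8").python); // 14
console.log(costFor(costs, "Some-OSS-70b")); // null
```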

Testing

  • npm run build compiles cleanly (only pre-existing lint warnings).
  • Verified roll-up math against a sample cost file + the real categories_2026_01_08.json: category totals and the global total reconcile (e.g. Agentic Coding $38.00, python subtask $14.00, global $45.21); uncovered models render a single n/a.
  • Adversarially reviewed for regressions: no-cost-file dates unchanged, no URL/render loop, no key-casing mismatch, fetch rejects HTML fallbacks.

🤖 Generated with Claude Code

Surfaces per-task / per-category / per-model cost beside scores, mirroring how
scores are displayed. Cost is a muted second line inside each existing metric
cell; OFF by default, so the table is unchanged unless toggled on.

Data is a parallel, feature-detected public/cost_<date>.csv with the SAME header
shape as table_<date>.csv but USD totals per task. Cost rolls up by SUM (totals,
not means): subtask -> category -> model. Models present in the cost file show
totals; every other model shows a single "n/a" (in the Global Average cell, or
the first metric cell when one category is selected). Dates without a cost file
hide the toggle entirely and render exactly as before.

- Averaging.js: add sumColumns() (sum twin of calculateAverage)
- CSVTable.jsx: showCost state + `cost` URL param, feature-detected cost fetch,
  Show Cost checkbox (only when a cost file is loaded), per-cell cost render,
  reset in Clear Filters
- App.css: .cost styling (muted, tabular-nums; hidden < 600px)
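The `cost` URL param round-trip might look like this, using the standard URLSearchParams API. `readShowCost()` and `writeShowCost()` are hypothetical helpers, and dropping the param when the toggle is off (rather than writing cost=false) is an assumption about how the other display toggles behave.

```javascript
// Reads the toggle from a query string such as "?cost=true&sort=score".
function readShowCost(search) {
  return new URLSearchParams(search).get("cost") === "true";
}

// Returns the updated query string; off is the default, so the param is
// removed instead of stored as "false". Clearing filters would pass false.
function writeShowCost(search, showCost) {
  const params = new URLSearchParams(search);
  if (showCost) params.set("cost", "true");
  else params.delete("cost");
  const query = params.toString();
  return query ? `?${query}` : "";
}

console.log(writeShowCost("?sort=score", true)); // ?sort=score&cost=true
```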

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lakshvantb force-pushed the feature/cost-display branch from 6cec09a to dd72fdb on June 18, 2026 17:04
lakshvantb and others added 7 commits June 18, 2026 17:28
TEST DATA for exercising the Show Cost (USD) feature — real per-task USD totals
computed from verified input+output tokens x cost_per_million (or stored cost_usd)
for 18 board models with correct token tracking. Normal tasks only, so the three
Agentic Coding columns (javascript/python/typescript) render "—" for covered models.
Models without pricing or not on the board show "n/a". Replace with official
rerun output before any production use.
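The per-task arithmetic behind this test data reduces to tokens × price per million tokens, summed over a model's runs of a task. A minimal sketch follows; `cost_per_million` and `cost_usd` come from the commit message, while `input_tokens`/`output_tokens` and both function names are hypothetical, and the precedence (token-based price first, stored cost_usd as the fallback, null when neither exists) is an assumption.

```javascript
// Hypothetical record: { input_tokens, output_tokens, cost_per_million, cost_usd }.
// Returns USD for one run, or null when neither pricing nor a stored cost exists.
function runCostUsd(run) {
  if (typeof run.cost_per_million === "number") {
    const tokens = run.input_tokens + run.output_tokens;
    return (tokens * run.cost_per_million) / 1e6;
  }
  if (typeof run.cost_usd === "number") return run.cost_usd;
  return null; // no pricing: the model would render "n/a"
}

// Per-task USD total: sum of the run costs, or null if any run is unpriced.
function taskTotalUsd(runs) {
  let total = 0;
  for (const run of runs) {
    const cost = runCostUsd(run);
    if (cost === null) return null;
    total += cost;
  }
  return total;
}

// 1,000,000 tokens at $3 per million tokens:
console.log(runCostUsd({ input_tokens: 600000, output_tokens: 400000, cost_per_million: 3 })); // 3
```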

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…le-count)

Addresses verification findings: csv.writer was emitting CRLF (table uses LF), and
the question_id->task join cross-labeled consecutive_events vs integrals_with_game.
Now assigns task by directory and keeps the latest-run version per (model,task).
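The corrected join rule (task taken from the run's directory, latest run kept per (model, task)) can be sketched like this. The generator script itself is not shown in the PR; the record shape `{ model, path, finishedAt, costUsd }`, the `<task>/<file>` path layout, and all three function names are hypothetical. Joining output rows with "\n" keeps LF line endings to match table_<date>.csv.

```javascript
// Hypothetical layout: the task is the directory that directly contains the
// run file, e.g. "runs/python/2026-06-18.json" -> "python".
function taskFromPath(path) {
  const parts = path.split("/").filter(Boolean);
  return parts.length >= 2 ? parts[parts.length - 2] : null;
}

// Keep only the most recent run for each (model, task) pair.
function latestRunPerModelTask(runs) {
  const latest = new Map();
  for (const run of runs) {
    const task = taskFromPath(run.path);
    if (task === null) continue; // cannot assign a task: skip rather than guess
    const key = JSON.stringify([run.model, task]);
    const previous = latest.get(key);
    if (!previous || run.finishedAt > previous.finishedAt) {
      latest.set(key, { model: run.model, task, finishedAt: run.finishedAt, costUsd: run.costUsd });
    }
  }
  return [...latest.values()];
}

// Join rows with "\n" (LF) instead of CRLF, matching table_<date>.csv.
function toCsv(header, rows) {
  return [header, ...rows].map((cells) => cells.join(",")).join("\n") + "\n";
}
```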

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
.other-controls now uses flex-wrap; previously the row overflowed horizontally and
the rightmost toggles (Show High Unseen Bias, the new Show Cost) were clipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
.other-controls is now centered (justify-content: center) so the wrapped toggle
line is balanced; Clear Filters moved into a centered .clear-filters-row below the
toggles instead of sitting inline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
.table-container is a column flex with align-items: flex-start, so child rows
shrank to content width and pinned left. align-self: stretch makes the rows full
width so justify-content: center actually centers Clear Filters (and the toggles).
- Clear cost state when the date changes so a model never briefly shows the previous
  date's cost while the new cost file loads.
- Replace the dense per-cell cost ternary with a small costCell(columns, isAnchor)
  helper; behavior identical, far more readable.
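A sketch of the decision a costCell(columns, isAnchor) helper makes, written as a plain function so it runs outside React. The PR names only costCell(columns, isAnchor); in this hypothetical version the row's cost data is passed in explicitly (null for an uncovered model), and the two-decimal dollar format and the "\u2014" placeholder for a covered model with no cost in those columns are assumptions based on the example table and the test-data commit.

```javascript
// Sum of the finite values present in `columns`, or null when there are none.
function sumColumns(row, columns) {
  let total = 0;
  let seen = 0;
  for (const col of columns) {
    const value = row[col];
    if (typeof value === "number" && Number.isFinite(value)) {
      total += value;
      seen += 1;
    }
  }
  return seen > 0 ? total : null;
}

// Returns the muted second-line text for one metric cell, or null for no line.
// isAnchor marks the Global Average cell (or the first metric cell when only
// one category is selected), so an uncovered model gets exactly one "n/a".
function costCell(costRow, columns, isAnchor) {
  if (costRow === null) return isAnchor ? "n/a" : null;
  const total = sumColumns(costRow, columns);
  if (total === null) return "\u2014"; // covered model, but no cost for these tasks
  return `$${total.toFixed(2)}`;
}

console.log(costCell({ a: 1.5, b: 2.25 }, ["a", "b"], false)); // $3.75
console.log(costCell(null, ["a", "b"], true)); // n/a
```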

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>